ABSTRACT

The prokaryotic immune system CRISPR/Cas (Clus- 
tered Regularly Interspaced Short Palindromic 
Repeats/CRISPR-associated genes) adapts to for- 
eign invaders by acquiring their short deoxyribonu- 
cleic acid (DNA) fragments as spacers, which guide 
subsequent interference to foreign nucleic acids 
based on sequence matching. The adaptation mech-
anism that avoids acquiring 'self' DNA fragments is
poorly understood. In Haloarcula hispanica, we pre- 
viously showed that CRISPR adaptation requires be- 
ing primed by a pre-existing spacer partially match- 
ing the invader DNA. Here, we further demonstrate 
that flanking a fully-matched target sequence, a func- 
tional PAM (protospacer adjacent motif) is still re- 
quired to prime adaptation. Interestingly, interfer- 
ence utilizes only four PAM sequences, whereas 
adaptation-priming tolerates as many as 23 PAM se- 
quences. This relaxed PAM selectivity explains how 
adaptation-priming maximizes its tolerance of PAM 
mutations (that escape interference) while avoid- 
ing mis-targeting the spacer DNA within CRISPR lo- 
cus. We propose that the primed adaptation, which 
hitches and cooperates with the interference path- 
way, distinguishes target from non-target by CRISPR 
ribonucleic acid guidance and PAM recognition. 



INTRODUCTION 

CRISPR (Clustered Regularly Interspaced Short Palin- 
dromic Repeats) arrays are present in ~90% of archaeal and 
~40% of bacterial genomes (1,2). Each array consists of vir- 
tually identical repeats that are separated by variable virus- 
or plasmid-derived sequences, known as spacers (3-5). This 
special structure is frequently flanked by a gene operon en- 
coding a diverse combination of CRISPR-associated (Cas) 
proteins (6). These two components together comprise the 
prokaryotic adaptive immune system against invading ge-
netic elements (7,8). Based on a 'polythetic' criterion, this
diverse system has been classified into three major types (9). 

In Type I systems, the precursor CRISPR ribonucleic 
acid (pre-crRNA) is processed by a Cas endoribonucle- 
ase, Cas6 in most cases, into mature CRISPR RNA (cr- 
RNA) molecules (10-12). Each crRNA contains a spacer 
guide flanked by two repeat remnants known as 5'- and 3'- 
handles (10,12). Mature crRNAs are loaded into the Cas- 
cade (CRISPR-associated complex for antiviral defence) 
complex to perform invader deoxyribonucleic acid (DNA) 
surveillance (10,13,14). The multifunctional Cas3, which 
possesses ATPase, helicase and nuclease activities (15), is 
then recruited by the Cascade subunit(s), such as Cse1 from
the Escherichia coli Type I-E system (16), to destroy the tar-
get DNA (10,14). In contrast, the adaptation (or spacer ac-
quisition) pathway, which shapes and updates the CRISPR 
memory of invader information, has been less characterized 
since the first report in Streptococcus thermophilus (7). Re- 
cent studies on the E. coli Type I-E system revealed two dif- 
ferent adaptation pathways, naive adaptation and priming 
adaptation (17,18). Efficient naive adaptation has only been
observed in Cas1- and Cas2-overexpressing E. coli cells,
in which new spacers were occasionally acquired from the 
chromosomal DNA (19). During priming adaptation, a pre- 
existing spacer directs efficient acquisition specifically from 
the invader DNA carrying a homologous sequence (17,20). 
The priming pathway allows interference to be restored
against escape invaders (17).

Similar to other immune systems, CRISPR requires a dis-
crimination mechanism to tell 'self' DNA, such as the
spacer DNA in the CRISPR cassette, from 'non-self' DNA,
such as the protospacer DNA from the invader. Such dis-
crimination should take place during both interference and 
adaptation stages, otherwise autoimmunity may occur ei- 
ther directly or indirectly. It was recently reported that in 
the Type I-E system, a fully matched target is interfered with only
when combined with one of four unchangeable PAM (pro-
tospacer adjacent motif) sequences (21). Lacking PAM se-
quences, spacers in the CRISPR locus are automatically
defined as a 'non-target' for interference. Therefore, this is
termed a 'target versus non-target' discrimination mecha-
nism (21), in contrast to the 'self versus non-self' mecha-
nism described for the Staphylococcus epidermidis Type III-
A system (22). In the Type III-A system, the spacer DNA is
protected by sensing the base pairing between the 5'-handle
of the crRNA and the corresponding portion of its pre-
ceding repeat, from which the 5'-handle derives. In con-
trast, the mechanism by which the CRISPR adaptation ma-
chinery discriminates 'self' from 'non-self' sequences is
poorly understood. Our recent study of the Haloarcula his- 
panica Type I-B CRISPR provides clues by demonstrating 
the absence or inactivation of the naive adaptation path- 
way in this system, in which a priming process is essentially 
required (20). This adaptation is strictly restricted to the 
invader DNA carrying a 'familiar' sequence that could be 
recognized by the crRNA of a pre-existing spacer, thereby 
resulting in discriminative adaptation. However, it should 
be noted that, without an additional self-avoidance mech- 
anism to distinguish the spacer DNA during adaptation- 
priming, the chromosomal sequences within or around the 
CRISPR cassette could still be acquired as self-targeting
spacers. Previous studies of priming adaptation revealed its 
insensitivity to PAM mutations flanking a target (17,20), 
which compromises the possibility of PAM authentication 
during priming. 

Type I-B systems in haloarchaea have recently been in-
vestigated with respect to the priming adaptation (20), cr-
RNA maturation (12,23) and interference (24,25) pathways.
A plasmid-based invader assay has revealed the important
role of PAM during target interference (25). In this study
of the H. hispanica Type I-B system, we systematically mu-
tated the tri-nucleotide PAM sequence of a fully matched 
target to determine its role during interference and espe- 
cially adaptation-priming. Our results revealed that H. his- 
panica Type I-B interference recognizes four specific PAM 
sequences, and surprisingly, in addition to these four se- 
quences, another 19 PAM variants are differently tolerated 
to elicit priming adaptation. It was demonstrated that PAM
authentication, which strictly recognizes the -1, -2 and
-3 nucleotides of a target (spacer-matching) sequence, is
common to interference and adaptation-priming processes.
Therefore, we propose that both adaptation and interfer-
ence require the base pairing-independent PAM recognition
and the base pairing-dependent crRNA guidance to exclude
the spacer DNA and other 'self' sequences.



MATERIALS AND METHODS 

Strains and culturing conditions 

The H. hispanica strains used in this study are listed in Sup-
plementary Table S1. The uracil auxotrophic (pyrF-deleted)
strain DF60 (26) and its derivatives were cultured at 37°C in
AS-168 medium (per litre, 200 g NaCl, 20 g MgSO4·7H2O,
2 g KCl, 3 g trisodium citrate, 1 g sodium glutamate, 50 mg
FeSO4·7H2O, 0.36 mg MnCl2·4H2O, 5 g Bacto Casamino
Acids, 5 g yeast extract, pH 7.2) with uracil added at a con-
centration of 50 mg/l. The strains transformed by pWL502
or its derivatives were cultured in yeast extract-subtracted 
AS-168. 

The E. coli JM109 used for cloning was cultured in Luria-
Bertani medium. When needed, ampicillin was added to a
final concentration of 100 mg/l.



Plasmid challenge assay 

The target plasmids (listed in Supplementary Table S1)
were constructed by cloning a sticky fragment into pWL502 
(27) predigested with BamHI and KpnI. The fragment 
contains a spacer-matching sequence preceded by a de- 
signed PAM sequence. In most cases, two different-sized 
oligonucleotides were annealed to generate this sticky frag- 
ment. The DNA fragment of the repeat-flanked target se- 
quence was amplified from the genomic CRISPR DNA 
with corresponding primers, and digested with BamHI 
and KpnI before cloning. To construct pR-TCT1 and pR-
TTC1, nucleotide substitutions were performed by poly- 
merase chain reaction (PCR) mutagenesis using a pGEM-T 
vector (pGEM-T Easy, Promega) carrying the wild-type re- 
peat sequence as the template. The corresponding oligonu- 
cleotides are listed in Supplementary Table S2. 

The plasmid challenge assay was performed by trans- 
forming these target plasmids into uracil auxotrophic 
DF60 cells according to the Halohandbook online protocol 
(http://www.haloarchaea.com/resources/halohandbook/ 
Halohandbook_2009_v7.2mds.pdf). Individual colonies 
were screened on yeast extract-subtracted AS-168 agar 
plates. For each CRISPR-interfered plasmid, three repli- 
cates were performed to evaluate the interference effect. 

Spacer acquisition assay 

Spacer acquisition assay against the target plasmids was 
performed as previously described (20) with a few modifica- 
tions. Briefly, for each target plasmid, at least three transfor- 
mant colonies were separately inoculated into yeast extract- 
subtracted AS-168 medium and cultured for at least 5 days 
to allow sufficient interaction between the CRISPR system 
and the target plasmid. The liquid cultures were centrifuged 
at 10 000 rpm for 1 min to collect the cells, which were then 
lysed in distilled water. For these samples, CRISPR expan- 
sion was monitored by PCR using primer pairs amplify- 
ing the leader-proximal end (ExTest-CAS2 and ExTest-SP1,
which locate within cas2 and spacer 1, respectively) (Supple- 
mentary Table S2). 

CRISPR mutant construction 

To construct the CRISPR mutants S1 C-1A and S1 C-1T, a
1294-bp CRISPR structure containing only one spacer 
(spacer 1) was first generated by bridge PCR. This struc- 
ture was then cloned into the pGEM-T vector and subjected 
to PCR mutagenesis. The mutated CRISPRs were subse- 
quently cloned into the suicide plasmid pHAR and used to 
replace the wild-type CRISPR through the pop-in-pop-out 
gene knockout strategy (26). 

RESULTS 

H. hispanica CRISPR recognizes four of 64 PAM variants 
for interference 

Recently, we reported the adaptation of H. hispanica Type 
I-B CRISPR to an invading virus or plasmid, in which the 
priming match between a pre-existing spacer and the in- 
vader DNA was strictly required (20). CRISPR interference 
was not observed due to escape mutations within the spacer-
matched sequence, i.e. protospacer. To exclusively investi-
gate the role of PAM, we constructed a series of plasmids us-
ing a target sequence (protospacer 1) that is fully matched by
spacer 1 (Figure 1A). Two oligonucleotides were annealed
to form a dsDNA fragment containing protospacer 1 and
a preceding tri-nucleotide (at positions -1, -2 and -3) as
the PAM sequence. The fragment with two sticky ends was
cloned into restricted pWL502 (27) (carrying the selection
marker gene pyrF) to generate the target plasmid. The pos-
sible base composition of the PAM sequence was sampled
exhaustively, yielding a total of 64 (4 x 4 x 4) different
plasmids, each named in the format pNNN1, where 'NNN'
represents the distinct PAM sequence and '1' represents the
common protospacer 1. These plasmids were transformed
into uracil auxotrophic H. hispanica DF60 (ΔpyrF) cells
(26) under selection pressure. Given a bona fide PAM, the
Cascade complex loaded with the crRNA of spacer 1 (s1 cr-
RNA) should recognize the target DNA, form an R-loop
structure (28) (Figure 1A), and recruit Cas3 for target in-
terference (10), which will cause reduced plasmid transfor-
mation efficiency.
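
The exhaustive sampling of the tri-nucleotide PAM can be sketched in a few lines. The snippet below is only an illustration and not part of the original workflow: it enumerates the 4 x 4 x 4 combinations and builds names in the pNNN1 format described above. The helper name pam_plasmid_names is hypothetical.

```python
from itertools import product

BASES = "ACGT"

def pam_plasmid_names(protospacer_id: str = "1") -> list[str]:
    """Return plasmid names in the pNNN<id> format, one per tri-nucleotide PAM.

    Each PAM is written 5' to 3', i.e. positions -3, -2 and -1 of the target.
    (Hypothetical helper, used for illustration only.)
    """
    return ["p" + "".join(pam) + protospacer_id
            for pam in product(BASES, repeat=3)]

names = pam_plasmid_names()
print(len(names))   # 64
print(names[:3])    # ['pAAA1', 'pAAC1', 'pAAG1']
```

Each name corresponds to one target plasmid carrying the common protospacer 1 behind a distinct PAM.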

Although each plasmid carries the fully-targeted proto-
spacer 1, varying interference effects were observed (Figure
1B), suggesting that interference activity is tightly regulated
by a PAM sensing event. Interestingly, a TT or CC din-
ucleotide at the -3 and -2 positions appeared necessary
for interference, and evidently reduced transformation ef-
ficiency was observed only for target plasmids with TTC,
TTT, TTG or CCC as the PAM sequence. This suggests
that H. hispanica interference only recognizes these four
PAM sequences, which we described as TIP (target inter-
ference permissive). In fact, the four TIP sequences are
not equally favoured by H. hispanica CRISPR, because the
cells showed almost absolute resistance to pTTC1, slightly
compromised resistance to pTTG1 and more compromised
resistance to pTTT1 and pCCC1 (Figure 1C). Correspondingly, in
our previous study, we observed numerous new spacers ac-
quired from foreign sequences conservatively preceded by
TTC (20). Notably, these TIP sequences are not only differ-
ent from the four PAMs recognized by the Type I-E interfer-
ence machinery (21), but also different from those adopted
by the Haloferax volcanii Type I-B system (25), suggesting
divergently evolved PAM selectivity between subtypes
and/or organisms.

Adaptation-priming tolerates as many as 23 PAM sequences 

The majority of PAM mutations should block target in- 
terference because of the strict PAM selectivity. However, 
priming adaptation can counter these escape mutations by 
acquiring new spacers from the target-bearing DNA (17). 
Whether all types of PAM mutations can be tolerated to 
prime adaptation remains unknown. Therefore, for the 60 
plasmids that escaped CRISPR interference, a spacer acqui- 
sition assay was subsequently performed against their trans- 
formants after a 5-day cultivation. Specific primers were 
used to amplify the CRISPR leader end, and arrays with 
new spacers incorporated were expected to produce larger- 
sized PCR products (20). Strikingly, CRISPR adaptation 
was observed for nearly one third of these escape plasmids
(Figure 2A), revealing 19 different PAM sequences that are
recognized to prime adaptation, which we described as PAP
(priming adaptation permissive). Correspondingly, the re-
maining 41 tri-nucleotides were referred to as PAIN (prim-
ing adaptation and interference non-permissive) sequences.
Notably, PAP sequences are not all equally favoured, be-
cause faint expanded bands were observed for pTAA1,
pCAG1, pCCG1 and pCGC1, whereas for most of the other
plasmids, evident CRISPR expansion was observed (Figure
2A).
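
The three categories follow from the two assay readouts for each PAM. As a minimal sketch (the function name classify_pam is hypothetical, and the boolean inputs stand in for the plasmid challenge and spacer acquisition results), the mapping and the reported counts can be written as:

```python
def classify_pam(interference: bool, adaptation: bool) -> str:
    """Map the two assay readouts for one PAM onto the categories used in the text."""
    if interference:
        return "TIP"   # permissive for interference
    if adaptation:
        return "PAP"   # primes adaptation but escapes interference
    return "PAIN"      # permissive for neither process

# Counts reported in the text: 4 TIP, 19 PAP-only and 41 PAIN tri-nucleotides.
counts = {"TIP": 4, "PAP": 19, "PAIN": 41}
assert sum(counts.values()) == 4 ** 3   # all 64 tri-nucleotides are accounted for
print(classify_pam(False, True))        # PAP
```

The sketch only restates the bookkeeping; the assignments themselves are experimental results.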

The TIP sequences TTC, TTT, TTG and CCC are also
PAP. For example, CRISPR expansion was similarly de-
tected for pCCC1 and pTTT1 transformants (Figure 2A).
However, for pTTC1 and pTTG1, transformed colonies
were rarely observed due to the extreme interference ef-
fect, and even when some colonies were observed, the
CRISPR activity may have been inactivated to survive the
selection pressure. For example, spacer 1 deletion was ob-
served for some pTTC1 and pTTG1 transformants (Sup-
plementary Figure S1). To circumvent this barrier, we
replaced protospacer 1 with a sequence that is partially
matched by spacer 13 (Figure 2B), which we designated 'pro-
tospacer 13v' for its derivation from the halovirus HHPV-
2 (20). As expected, interference was not observed with
the modified plasmids pTTC13v and pTTG13v (data not
shown), whereas adaptation was readily detected for their
transformants (Figure 2C), indicating that TTC and TTG
are also PAP sequences. This result also suggests that com-
pared to interference, priming adaptation tolerates more
crRNA-protospacer mismatches. Adaptation to the pro-
tospacer 13v target combined with another TIP sequence
(TTT), two PAP sequences (TCC and CTC) (20) and two
PAIN sequences (AGC and ACC) was also tested,
which showed similar results to the protospacer 1-
based assay (Figure 2C).

It should be noted that neither interference nor adapta-
tion was observed with the empty pWL502 (data not shown),
and the engineered pNNN1 target plasmids are identical
except for the designed PAM preceding protospacer 1;
hence their different performance in interference and adap-
tation assays clearly demonstrates the PAM selectivity of
the interference and priming processes (summarized in Fig-
ure 3). Interestingly, most (15 of 19) PAP sequences could
result from one of the four TIP sequences through a sin-
gle point mutation, suggesting that adaptation-priming has
shaped its PAM selectivity to tolerate these point mutations
that escape interference. However, sequences with a purine
(A or G) at the -3 position are consistently PAIN; therefore,
a purine (A or G) mutation at this position can cause escape
from both interference and priming adaptation.
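
The single point mutation argument can be checked by enumeration. The following sketch, which is an illustration rather than the authors' analysis, lists every tri-nucleotide one substitution away from a TIP sequence and then drops those with a purine at the -3 position (the first letter), which the text reports as always PAIN. Which of the remaining candidates are actually PAP is an experimental result and is not derived here.

```python
BASES = "ACGT"
TIP = {"TTC", "TTT", "TTG", "CCC"}

def single_mutants(pam: str) -> set[str]:
    """All tri-nucleotides that differ from `pam` at exactly one position."""
    return {pam[:i] + b + pam[i + 1:]
            for i in range(len(pam))
            for b in BASES if b != pam[i]}

# Union of the one-step neighbours of the four TIP sequences, excluding the TIPs.
neighbours = set().union(*(single_mutants(p) for p in TIP)) - TIP

# A purine (A or G) at -3 is reported as consistently PAIN, so keep C/T starts only.
pyrimidine_start = {p for p in neighbours if p[0] in "CT"}

print(len(neighbours))        # 26
print(len(pyrimidine_start))  # 18
```

Under these assumptions, 18 single-mutation escape variants remain candidates for priming, which is compatible with the 15 such PAP sequences reported.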

PAM authentication prevents interference and priming adap- 
tation occurring to the CRISPR locus 

Within the H. hispanica CRISPR cassette, the three repeat 
nucleotides immediately preceding each spacer are con- 
served AGC, which was identified as a PAIN sequence (Fig- 
ure 3). Hence both interference and priming adaptation are 
prevented from occurring to the spacer DNA. However, this protection
could also be attributed to additional base pairing formed
at the -1, -2 and -3 positions according to the 'self versus
Figure 1. Identification of functional PAM sequences for CRISPR-interference. (A) Diagram depicting the plasmid challenge assay to determine the
effects of various PAM sequences on CRISPR-interference. Each target plasmid carries a sequence that is fully matched by spacer 1 (protospacer1). The tri-
nucleotide PAM sequence located at the -1 to -3 positions of protospacer1 was exhaustively varied to generate 64 different sequences. Two oligonucleotides
were annealed to form a sticky fragment containing protospacer1 and the PAM sequence, which was inserted into BamHI- and KpnI-digested pWL502.
The pWL502 plasmid carries a pyrF gene that is required for Haloarcula hispanica DF60 cells to grow under uracil-free selection pressure. Cascade loaded
with the crRNA of spacer 1 (s1 crRNA) is expected to recognize protospacer1 with a bona fide PAM and form an R-loop structure to initiate target
destruction. The 5'- and 3'-handles of the s1 crRNA are derived from the spacer 1-flanking repeats. (B) Interference was observed to targets flanked by
four PAM sequences (TTT, TTC, TTG and CCC) but not to those by the other 60 tri-nucleotides. The 64 tri-nucleotide sequences are arranged according
to the -2 and -3 nucleotides. At the -1 position, N stands for A, T, G or C; B stands for nucleotide T, G or C; whereas D means nucleotide A, T or G. (C)
Interference effects to target plasmids carrying TTT, TTC, TTG or CCC as the PAM sequence. Three replicates were performed for each plasmid, and the
relative transformation rate was calculated against the control pWL502.



non-self' theory of the Type III-A system (22). By analysing
the crRNA-PAM base pairing pattern for each possible 
PAM sequence (Figure 3), interference and adaptation- 
priming showed no dependence on this extended base pair- 
ing. Therefore, H. hispanica CRISPR discriminates target 
from non-target by authenticating the PAM sequence in- 
stead of by sensing the crRNA-PAM base pairing. 
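
The base pairing analysis can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the PAM is written on the crRNA-sense strand, so a PAM nucleotide identical to the corresponding 5'-handle nucleotide (AGC) means the complementary strand could pair with the crRNA at that position. The label numbering is an assumption chosen to agree with the labels that remain legible in Figure 3 (0 to 5 and 7); label 6 is inferred, and the names HANDLE, PATTERN_LABEL and pairing_pattern are hypothetical.

```python
HANDLE = "AGC"  # last three repeat nucleotides, i.e. the crRNA 5'-handle at -3, -2, -1

# Label for each set of potentially paired positions (index 0 is -3, 1 is -2, 2 is -1).
# Labels 0-5 and 7 agree with legible Figure 3 entries; label 6 is an assumption.
PATTERN_LABEL = {
    (): 0, (0,): 1, (1,): 2, (2,): 3,
    (0, 1): 4, (0, 2): 5, (1, 2): 6, (0, 1, 2): 7,
}

def pairing_pattern(pam: str) -> int:
    """Return the base pairing pattern label for a tri-nucleotide PAM."""
    matches = tuple(i for i in range(3) if pam[i] == HANDLE[i])
    return PATTERN_LABEL[matches]

# TTT (TIP) and CCA (reported as not PAP) share pattern 0, while TTC and CCC (TIP)
# share pattern 3 with CTC, so the pattern alone does not predict the outcome.
for pam in ("TTT", "CCA", "TTC", "CCC", "AGC"):
    print(pam, pairing_pattern(pam))
```

That permissive and non-permissive sequences fall into the same pattern illustrates why the outcome is attributed to PAM identity rather than to extended base pairing.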

A recent study reported a repeat binding protein that 
specifically binds to the CRISPR direct repeats (29), which 
may impede priming adaptation on the CRISPR DNA. 
Therefore, the spacer-flanking repeat sequences potentially 
provide additional self-protective mechanisms during prim-
ing adaptation. To test this possibility, we constructed plas-
mids containing a repeat sequence immediately preceding
or following the protospacer1 target (Figure 4). As ex-
pected, neither interference nor adaptation was observed
with pR1, because the preceding repeat provides AGC as a
PAIN 'PAM'. When AGC was mutated to a TIP sequence
(TTC) in pR-TTC1 and to a PAP sequence (TCT)
in pR-TCT1, interference and priming adaptation were ob-
served, respectively (Figure 4). On the other hand, com-
pared to pAGC1 and pTCT1, addition of a downstream
repeat in pAGC1-R and pTCT1-R did not affect their per-
formance in the spacer acquisition assay (Figure 4). We previ-
Figure 2. Adaptation priming tolerates 23 PAM sequences. (A) Spacer acquisition assay performed to target plasmids with protospacer1 preceded by 62
different PAM sequences (TTC and TTG not included). For each plasmid, three independent transformant colonies were tested, and a representative
result is shown. The wild-type CRISPR generates a ~200-bp PCR product, and larger-sized PCR products indicate that new spacers have been acquired
causing expanded CRISPRs. (B) Scheme showing the provirus-derived sequence (protospacer13v, framed) that is partially matched by spacer 13. (C) Spacer
acquisition assay performed to target plasmids (pNNN13v) containing protospacer13v that is preceded by seven different PAM sequences, including TTC
and TTG.
[Figure 3 grid of the 64 tri-nucleotides with base pairing pattern labels (0 to 7). Legible entries: AAA 1, ACA 1, ATA 1, AGT 4, AAT 1, ACT 1, ATT 1, AGG 4, AAG 1, ACG 1, ATG 1, AGC 7, AAC 5, ACC 5, ATC 5, CTA 0, CGT 2, CAT 0, CCT 0, CTT 0, CTC 3. The remaining entries and the base pairing pattern diagram are not recoverable.]
Figure 3. Summary of the potentials of the 64 tri-nucleotides to serve as a functional PAM during interference and/or adaptation-priming. The 23 tri-
nucleotides shown against grey are permissive for priming adaptation (PAP), of which TTC, TTG, TTT and CCC are also permissive for interference (TIP).
The other 41 tri-nucleotides are permissive for neither interference nor priming adaptation (PAIN). The underlined PAP variants could result from one
of the TIP sequences through a single point mutation. Each tri-nucleotide is labelled with a number varying from 0 to 7, and these numbers indicate the
different base pairing patterns (shown at the bottom) which potentially occur between each PAM sequence and the crRNA 5'-handle nucleotides AGC.
N in the crRNA-sense strand or the crRNA-complementary strand signifies a nucleotide that is not the same as or not complementary to the corresponding
5'-handle nucleotide.



Nucleic Acids Research, 2014, Vol 42, No. 11 7231 



Plasmid    Arrangement (5' to 3')           Interference   Adaptation
pR1        repeat (AGC) + protospacer1      -              -
pR-TCT1    repeat (TCT) + protospacer1      -              +
pR-TTC1    repeat (TTC) + protospacer1      +              ND
pAGC1-R    AGC + protospacer1 + repeat      -              -
pTCT1-R    TCT + protospacer1 + repeat      -              +


Figure 4. Upstream repeat protects the spacer DNA by providing a PAIN
'PAM' sequence (AGC). The plasmid pR1 has an intact CRISPR repeat se-
quence preceding protospacer1, and the last three repeat nucleotides AGC
were mutated to TCT or TTC to generate pR-TCT1 or pR-TTC1. For
pAGC1-R and pTCT1-R, the protospacer1 target is preceded by AGC and
TCT, respectively, and followed by an intact repeat. According to Figure 3,
AGC, TCT and TTC are PAIN (priming adaptation and interference non-
permissive), PAP (priming adaptation permissive) and TIP (target inter-
ference permissive) PAM sequences, respectively. Interference/adaptation
to the target plasmid was observed (+) or not observed (-). ND, not de-
termined.

ously revealed that DNA sequences upstream and down- 
stream of the priming protospacer can both be acquired as 
new spacers, albeit with different strand bias and efficiency 
(20). By randomly selecting individual colonies showing
an expanded CRISPR, we collected 32 new spacers from
22 pTCT1-R colonies and 39 new spacers from 28 pR-
TCT1 colonies (Supplementary Table S3). Protospacers,
from which these new spacers derived, could locate both up-
stream and downstream of the priming protospacer1 (Sup-
plementary Figure S2), and showed a preference pattern
similar to that observed for HHPV-2 and pVS (20). This
suggests that, given a PAP PAM sequence, spacer acquisi-
tion from either side of the priming protospacer is not
impeded by a flanking repeat. It appears that for the
spacer DNA, PAM authentication that recognizes its up-
stream repeat nucleotides as a PAIN signal serves as the only
self-protective mechanism.

PAM authentication strictly recognizes nucleotides -1, -2
and -3 of the target sequence

The striking finding that adaptation-priming tolerates more
than 20 PAM variants made us question whether this toler-
ance actually derives from relaxed PAM selectivity at
the theoretical -1, -2 and -3 positions, or whether nucleotides next
to these positions have been misrecognized as a portion
of PAM. Given the latter possibility, some results in Fig-
ure 2A may be false positives. Because the canonical PAM
of this system proves to be TTC (20), we designed target
plasmids pGTT5, pATT5, pCTT5 and pTTT5, with proto-
spacer5 that is fully matched by spacer5. If misrecognition
could occur to the PAM-3'-side nucleotide(s), the TTC nu-
cleotides at the -2, -1 and +1 positions may be misrecog-
nized as a permissive signal for both interference and prim-
ing adaptation (Figure 5A). However, consistent with the
protospacer1-based assay, CRISPR interference was only
observed to pTTT5 (Figure 5C), and adaptation was observed
for pCTT5 but not for pATT5 or pGTT5 (Figure 5A), in-
dicating the +1 nucleotide could not be misrecognized as a
portion of PAM. We hypothesized that crRNA base pair-
ing at the +1 position may have prevented this misrecog-
nition. Therefore, we further designed plasmids pGTT4ms,
pATT4ms, pCTT4ms and pTTT4ms ('ms' represents mis-
match) with a modified protospacer4, whose first nucleotide
A was substituted by C to introduce a mismatch (to the s4
crRNA) at the +1 position (Figure 5B). The crRNA match-
ing within the seed region (positions +1 to +10) has been
reported to be essential for Type I-B interference (24),
and consistently, pTTT4ms escaped CRISPR interference
(Figure 5C). Adaptation to pATT4ms and pGTT4ms was
consistently not observed (Figure 5B), indicating that even with-
out crRNA matching at the +1 position, the +1 nucleotide can
still not be misrecognized for PAM authentication. Curi-
ously, priming adaptation to pCTT4ms was blocked, hence
the crRNA matching at position +1 appears to be impor-
tant; however, adaptation to pTTT4ms was not affected
(Figure 5B). From Figure 2A, we can see that TTT seemed
a more favoured PAM for adaptation-priming, which may
have compensated for the +1 mismatch.

By ruling out misrecognition at position +1, CCC should
be a reliable PAP (and TIP) PAM during the pCCC1 chal-
lenge assay, because the immediate 5' upstream sequence
of this PAM is a designed BamHI restriction site (5'-
GGATCC-3'), and regardless of whether the -4 or even -5 nucleotide
could be misrecognized as a portion of PAM, authentica-
tion consistently occurred to a CCC tri-nucleotide (Figure
6A). Accordingly, the failure to recognize CCA and CCT
as PAP (Figure 2A) suggests that misrecognition of the -4
nucleotide diagrammed in Figure 6A could not happen. We
noted that crRNA-PAM matching at the -1 position oc-
curred for pCCC1 but not for pCCA1 or pCCT1, so we con-
structed two mutant CRISPRs, S1 C-1A and S1 C-1T, respec-
tively carrying a C-to-A and a C-to-T repeat mutation at
the -1 position of the spacer 1 DNA (Supplementary Figure
S3). These CRISPR mutants retained the adaptation phe-
notype to pCCC1 (Figure 6C), indicating that crRNA bio-
genesis was not affected. Then we challenged the S1 C-1A and
S1 C-1T cells with pCCA1 and pCCT1, respectively, in which
the additional -1 base pairing was introduced by these re-
peat mutations (Figure 6B). However, adaptation was still
not observed (Figure 6C), indicating that the PAM-5'-side
nucleotide(s) cannot be misrecognized for PAM authenti-
cation, with or without a -1 base pairing.
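
The logic of these frame-shift controls can be made explicit with a small sketch. The helper below is hypothetical and only illustrative: it reads the tri-nucleotide that would be seen if PAM authentication were shifted by one or more positions. The two inputs are taken from the text: protospacer5 begins with C (TTC at positions -2, -1 and +1 in pNTT5), and a BamHI site (GGATCC) lies immediately upstream of the designed PAM in the pNNN1 plasmids.

```python
def pam_window(upstream: str, protospacer: str, shift: int = 0) -> str:
    """Tri-nucleotide read in a frame shifted by `shift` from the true PAM.

    shift = 0 reads positions -3..-1, shift = +1 reads -2..+1 and
    shift = -1 reads -4..-2 (there is no position 0).
    `upstream` must end with the designed PAM.
    """
    seq = upstream + protospacer
    start = len(upstream) - 3 + shift
    if start < 0 or start + 3 > len(seq):
        raise ValueError("window falls outside the supplied sequence")
    return seq[start:start + 3]

# pCTT5: PAM CTT followed by protospacer5, whose first nucleotide is C.
print(pam_window("CTT", "C", shift=1))         # TTC
# pCCA1: BamHI site GGATCC immediately upstream of the PAM CCA.
print(pam_window("GGATCCCCA", "", shift=-1))   # CCC
```

In both cases the shifted frame would present a TIP sequence, so the absence of interference with pCTT5 and of adaptation to pCCA1 argues against misrecognition of the neighbouring nucleotides.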

From above, we conclude that for a priming protospacer,
PAM authentication strictly recognizes its -1, -2 and -3
nucleotides, and the nearby crRNA-target base pairing does
not affect this recognition. This is consistent with the very
recent finding that the Streptococcus pyogenes Cas9 protein
recognizes PAM prior to R-loop formation (30). Together,
these results support the identification of the 23 PAP se-
quences.

DISCUSSION 

The ability to discriminate 'self' from 'non-self' is essen-
tial for every immune system. CRISPR-Cas serves as the 
only adaptive defence line in prokaryotes. Guided by cr- 
RNA molecules, CRISPR interference is directed to homol- 
ogous foreign DNA. However, the crRNA-encoding DNA 
Figure 5. The +1 nucleotide was not recognized as a portion of PAM with (A) or without (B) a crRNA-target base pairing at this position. The pNTT5 and
pNTT4ms ('ms' stands for mismatch) plasmids contain the protospacer sequence of spacer5 and spacer4, respectively, and their protospacer is preceded
by a PAM sequence of ATT, GTT, CTT or TTT. The A-to-C substitution at the +1 position of protospacer4 introduces a mismatch here while generating
a TTC sequence at positions -2, -1 and +1. The crRNA of spacer4 or spacer5 is denoted as s4 or s5 crRNA. The 5'-handle nucleotides are shown in
orange, the spacer and protospacer sequences are in blue and the designed PAM sequence is in red. If the +1 nucleotide could be recognized as a portion of
PAM, the underlined sequences may be misrecognized for PAM authentication. Lane Ms, dsDNA size markers. In panel (C), different interference effects
to pTTT5 and pTTT4ms in DF60 cells are shown. Three replicates were performed for each plasmid, and the relative transformation rate was calculated
against the control pWL502.
Figure 6. The -4 nucleotide was not recognized as a portion of PAM without (A) or with (B) an additional -1 base pairing. Priming adaptation was
observed to pCCC1, but not to pCCW1 (see Figure 2A) (W is an A or T). DF60 cells encode a wild-type crRNA of spacer 1 (s1 crRNA), whereas the S1 C-1A
and S1 C-1T mutant cells express variant s1 crRNA molecules (s1 C-1A and s1 C-1T crRNA), in which the 5'-handle carries a C-to-A or C-to-U mutation at
the -1 position (indicated by an orange arrow). The 5'-handle nucleotides are shown in orange, the spacer and protospacer sequences are in blue and the
designed PAM sequence is in red. If the -4 nucleotide could be recognized as a portion of PAM, the underlined sequences may be misrecognized for PAM
authentication. Panel (C) shows S1 C-1A and S1 C-1T CRISPRs could not adapt to pCCA1 and pCCT1, respectively. Lane M, dsDNA size marker.



(i.e. the spacer DNA) in the chromosome must be discrim-
inatively protected. For this discrimination, two different
mechanisms have been proposed, the 'self versus non-self'
mechanism for the Type III-A system (22), and the 'target
versus non-target' mechanism for the Type I-E system (21).
Our data demonstrate that similar to Type I-E, the Type I- 
B interference machinery in H. hispanica cells also adopts 
a 'target versus non-target' mechanism based on PAM au- 
thentication. The tri-nucleotide sequences TTC, TTG, TTT 
and CCC can separately serve as a functional PAM during 
interference, which we termed 'target interference permis- 
sive' or TIP sequences. In E. coli, PAM recognition occurs 
to the nucleotides on the crRNA-complementary strand 
(termed target interference motif or TIM) (16,31), whereas 
the corresponding mechanism for the Type I-B system re- 
mains to be further investigated, given their different Cas 
components. The E. coli Cascade consists of five different
Cas proteins (Cse1, Cse2, Cas7, Cas5 and Cas6e) in an un-
even stoichiometry (1:2:6:1:1) (32). The Cse1 subunit is be-
lieved to be essential for PAM sensing, because direct interac-
tion was observed between its conserved L1 loop and the
PAM sequence (16). However, Cse1 is not common to other
Type I subtypes. For example, the Type I-B Cascade com-
prises Cas5, Cas6, Cas7 and probably the specialized Cas8b
(12,20,23). Interestingly, the Type I-E Cse1 and Type I-B
Cas8b are both large proteins and predicted to share sim-
ilar domain organization (33). Moreover, previous studies
on the Haloferax Type I-B systems revealed that Cas8b is
not required for the in vivo stability of crRNA (12,23), sug-
gesting that Cascade lacking the Cas8b subunit exists sta-
bly, consistent with the observed Cse1 dissociation from
E. coli Cascade at low concentrations (14). Thus, we infer
that Cas8b probably monitors the PAM sequence in Type
I-B systems. Interestingly, a previous study of the H. vol-
canii Type I-B system reported that ACT, TAA, TAT, TAG
and CAC could serve as a functional PAM for interfer-
ence (25). However, in our assay, the H. hispanica CRISPR
does not interfere with plasmids carrying these PAM sequences.
These two haloarchaeal Type I-B systems carry nearly iden- 
tical repeat sequences, whereas their Cas proteins are less 
conserved (Supplementary Figure S4), particularly Cas8b 
with an identity of 22.6%, which may underlie their different 
PAM selectivity. 

Figure 7. An integrated target discrimination model for CRISPR interference and adaptation-priming. The Cascade complex may quickly scan large DNA 
molecules for a TIP (target interference permissive) or PAP (priming adaptation permissive) PAM sequence. This PAM recognition process would ignore 
sequences preceded by a PAIN (priming adaptation and interference non-permissive) tri-nucleotide, including the spacer DNA and some PAM-mutated 
invader targets. When a TIP or PAP PAM sequence is recognized, Cascade may utilize its RNA component (i.e. crRNA) to further examine the PAM- 
following sequence (i.e. protospacer) while forming the R-loop structure. PAM interaction and the crRNA-protospacer matching potential both affect the 
Cascade-target affinity, which may regulate the nuclease activity of the subsequently recruited Cas3. In the case of a fully-matched protospacer combined 
with a TIP PAM, interference and priming adaptation may both occur. When the PAM is mutated to a PAP sequence, or when some mismatches are 
introduced within protospacer, the Cas3 nuclease activity may be downregulated and only priming adaptation occurs. 

The interference target is predetermined by the 
invader-derived spacer sequences, which have been integrated 
into CRISPR arrays during adaptation. This indicates that 
indiscriminate adaptation would lead to self-targeting 
spacers, similar to those observed during naive adaptation 
in Cas1- and Cas2-overexpressing E. coli cells (19). 
Therefore, the CRISPR adaptation machinery also requires a 
discrimination mechanism, which has been elusive for years. 
Our recent study demonstrates that in the H. hispanica Type 
I-B system, a priming crRNA partially matching the invader 
DNA is strictly required for adaptation (20), suggesting 
that discriminative adaptation to foreign DNA may be 
achieved by this priming requirement. However, similar to 
the crRNA-guided interference, the crRNA-primed adaptation 
also has to preclude the host spacer DNA. Although mutations 
in the PAM sequence have previously been shown to be 
tolerated during adaptation-priming (17,20), our data here 
demonstrate that PAM authentication does occur, but with 
relaxed stringency. This authentication tolerates as many as 
23 PAM sequences, which we describe as 'priming adaptation 
permissive' or PAP. Moreover, we confirmed this relaxed PAM 
selectivity by showing that PAM authentication is precisely 
positioned to the -1, -2 and -3 nucleotides of the priming 
protospacer. From Figure 3, the rules for a PAP PAM for the 
H. hispanica Type I-B CRISPR can be summarized as: (a) 
purines are not allowed at the -3 position; (b) T is 
favoured at the -2 and -3 positions; and (c) C is favoured 
at the -1 position. Within the CRISPR cassette, the repeat 
nucleotides immediately preceding each spacer are 
consistently AGC, which is 'priming adaptation and 
interference non-permissive' or PAIN according to rule (a). 
This PAIN sequence will be ignored during PAM recognition, 
thereby protecting the spacer DNA from interference and 
adaptation-priming. Our data demonstrate that these three 
repeat nucleotides preceding a spacer DNA serve as its only 
protective determinant, suggesting that mutations at these 
positions, particularly -3 and -2, can cause priming 
adaptation or even interference to the CRISPR DNA itself, in 
which case CRISPR immunity must be inactivated. 
Consistently, when the E. coli crRNA was mutated at these 
positions, interference to a TIP target was not observed for 
unknown reasons (21). 
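
The PAM categories described above can be illustrated with a minimal Python sketch. It uses only what the text states: the four TIP sequences, and rule (a), which excludes any tri-nucleotide with a purine at the -3 position. The function name screen_pam and the label 'unresolved' are hypothetical, introduced here for illustration. Rules (b) and (c) describe preferences rather than strict requirements, so the sketch does not attempt to reconstruct the full set of 23 PAP sequences, which must be read from Figure 3.

```python
from itertools import product

# Interference-permissive (TIP) PAMs listed in the text, written 5'->3' so
# that the three characters occupy positions -3, -2 and -1.
TIP_PAMS = {"TTC", "TTG", "TTT", "CCC"}
PURINES = {"A", "G"}


def screen_pam(pam: str) -> str:
    """Coarse screen of a tri-nucleotide PAM (hypothetical helper).

    Returns 'TIP', 'PAIN' or 'unresolved'. 'unresolved' means the PAM passes
    rule (a) but is not a TIP sequence; whether it is PAP or PAIN cannot be
    decided from the rules quoted in the text alone.
    """
    pam = pam.upper()
    if len(pam) != 3 or set(pam) - set("ACGT"):
        raise ValueError(f"expected a DNA tri-nucleotide, got {pam!r}")
    if pam in TIP_PAMS:
        return "TIP"
    if pam[0] in PURINES:
        # Rule (a): a purine at -3 is not tolerated.
        return "PAIN"
    return "unresolved"


if __name__ == "__main__":
    # The repeat nucleotides preceding each spacer (AGC) fail rule (a).
    print("AGC ->", screen_pam("AGC"))

    # Tally all 64 possible tri-nucleotides by screening outcome.
    counts = {}
    for bases in product("ACGT", repeat=3):
        label = screen_pam("".join(bases))
        counts[label] = counts.get(label, 0) + 1
    print(counts)
```

Rule (a) alone removes half of the 64 possible tri-nucleotides, including the AGC that precedes every spacer in the CRISPR array, which is the protective property the paragraph above relies on.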

A previous study has proposed a model for Cascade-mediated 
target DNA recognition (16). Based on that model and our 
data here, we propose an integrated target-recognition model 
for CRISPR interference and adaptation-priming in Figure 7. 
We speculate that the Cascade complex may utilize its 
protein subunit(s) to scan DNA molecules for a permissive 
PAM (TIP or PAP), and this process could preclude 'self' 
sequences preceded by a PAIN sequence, such as the spacer 
DNA. Once a TIP or PAP PAM is detected, Cascade may utilize 
its RNA component (i.e. crRNA) to further examine the 
spacer-matching potential of the PAM-following sequence 
(i.e. protospacer). By these two mechanisms, interference 
and priming adaptation could be discriminatively directed to 
the target-bearing invader DNA, but not to the 
crRNA-encoding 'self' DNA or other sequences. It should be 
noted that, compared to interference, the adaptation-priming 
process seems to tolerate more PAM variations and more 
protospacer mutations. It has been reported that for a PAM- 
or protospacer-mutated target that escapes interference, the 
E. coli Cascade binds with decreased affinity (14), 
suggesting that Cascade possibly requires stronger target 
affinity to elicit interference than to prime adaptation, 
which may explain their different tolerance. 
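
As a rough illustration of the two-step model proposed above (PAM scan first, crRNA guidance second), the outcomes described for Figure 7 can be written as a small decision function in Python. This is a toy sketch under stated assumptions, not a quantitative model. The function name predict_outcome is hypothetical, and so is the parameter priming_tolerance: the text gives no numeric limit on how many mismatches priming tolerates. The 'no response' outcome for heavily mismatched targets is likewise an assumption.

```python
def predict_outcome(pam_class: str, mismatches: int, priming_tolerance: int) -> str:
    """Toy decision rule following the model described for Figure 7.

    pam_class:         'TIP', 'PAP' or 'PAIN', as defined in the text.
    mismatches:        number of crRNA-protospacer mismatches.
    priming_tolerance: HYPOTHETICAL cutoff for priming; no numeric value
                       is given in the text.
    """
    if pam_class not in {"TIP", "PAP", "PAIN"}:
        raise ValueError(f"unknown PAM class: {pam_class!r}")
    if mismatches < 0 or priming_tolerance < 0:
        raise ValueError("mismatch counts must be non-negative")

    # Step 1: PAM scan. Sequences preceded by a PAIN tri-nucleotide
    # (such as the spacer DNA in the CRISPR array) are ignored.
    if pam_class == "PAIN":
        return "ignored"

    # Step 2: crRNA guidance. Assumption: beyond some number of
    # mismatches the target elicits no response at all.
    if mismatches > priming_tolerance:
        return "no response"

    # A fully matched protospacer with a TIP PAM allows both pathways.
    if pam_class == "TIP" and mismatches == 0:
        return "interference and priming adaptation"

    # A PAP PAM, or a TIP PAM with some mismatches, only primes adaptation.
    return "priming adaptation only"


if __name__ == "__main__":
    for pam_class, mm in [("TIP", 0), ("TIP", 1), ("PAP", 0), ("PAIN", 0)]:
        print(pam_class, mm, "->", predict_outcome(pam_class, mm, priming_tolerance=2))
```

The ordering of the checks mirrors the proposed sequence of events: PAM recognition does not depend on base pairing and acts as the first filter, so a PAIN-flanked sequence is rejected regardless of how well it matches the crRNA.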

Combining these results with our previous finding that the 
H. hispanica system strictly requires a priming process for 
adaptation (20), we propose that the base 
pairing-independent PAM recognition and base 
pairing-dependent crRNA guidance together provide reliable 
'target versus non-target' discrimination for the CRISPR 
interference and adaptation pathways in this system. The 
relaxed PAM selectivity during the adaptation-priming 
process explains how this process maximizes its tolerance of 
PAM mutations of a target (that escape interference), while 
avoiding mis-targeting the spacer DNA. Though details may 
differ, these discrimination mechanisms may function 
similarly in other CRISPR systems where adaptation strictly 
requires being primed. 